Modeling and Extracting Deep-Web Query Interfaces
Authors
Abstract
Interface modeling and extraction is a fundamental step in building a uniform query interface to a multitude of databases on the Web. Existing solutions are limited in that they assume interfaces are flat and thus ignore the inherent structure of interfaces, which seriously hampers the effectiveness of interface integration. To address this limitation, in this chapter we model an interface with a hierarchical schema (e.g., an ordered tree of attributes). We describe ExQ, a novel schema extraction system with two distinct features. First, ExQ discovers the structure of an interface from its visual representation via spatial clustering. Second, ExQ annotates the discovered schema with labels from the interface by imitating the human annotation process. ExQ has been extensively evaluated on real-world query interfaces in five different domains, and the results show that it achieves an accuracy above 90% in both the structure discovery and schema annotation tasks.
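To make the contrast between a flat and a hierarchical schema concrete, the following is a minimal sketch of an ordered tree of attributes. It is not ExQ's actual data structure: the class name SchemaNode, its methods, and the airfare example with its labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaNode:
    """One node of an ordered-tree schema (hypothetical structure).

    Children are kept in a list so that their order, e.g. the order in
    which fields appear on the interface, is preserved.
    """
    label: str
    children: List["SchemaNode"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return leaf labels left to right: the 'flat' view of the schema."""
        if not self.children:
            return [self.label]
        result: List[str] = []
        for child in self.children:
            result.extend(child.leaves())
        return result

    def depth(self) -> int:
        """Number of edges on the longest path from this node to a leaf."""
        if not self.children:
            return 0
        return 1 + max(child.depth() for child in self.children)


# Hypothetical airfare search interface: groups of related fields become
# internal nodes, individual input fields become leaves.
airfare = SchemaNode("Flight search", [
    SchemaNode("Route", [SchemaNode("From"), SchemaNode("To")]),
    SchemaNode("Dates", [SchemaNode("Departure"), SchemaNode("Return")]),
    SchemaNode("Passengers", [SchemaNode("Adults"), SchemaNode("Children")]),
])

flat_view = airfare.leaves()
tree_depth = airfare.depth()
```

A flat model would retain only flat_view, losing the fact that "From" and "To" belong to one group; the tree keeps that grouping, which is the kind of structure the abstract argues matters for interface integration.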
Similar resources
Describing the Semantic Relation of the Deep Web Query Interfaces Using Ontology Extended LAV
The key element in a Deep Web information fusion system is the data source modeling problem, which is the determinant technical factor of the whole system. The query interfaces provided by the Deep Web are the clues to disclose the hidden schemas. But the complicated semantic relationships in the query interfaces lead to the lower generality and ability of local as view (LAV) method in the trad...
Ontology Based Automatic Attributes Extracting and Queries Translating for Deep Web
Search engines and web crawlers cannot access the Deep Web directly. The workable way to access a hidden database is through its query interface. Automatically extracting attributes from query interfaces and translating queries is a feasible way to address the current limitations in accessing the Deep Web. However, the query interface provides semantic constraints, some attributes are co-occurred a...
Deep Web Content Mining
The rapid expansion of the web is causing the constant growth of information, leading to several problems such as increased difficulty of extracting potentially useful knowledge. Web content mining confronts this problem gathering explicit information from different web sites for its access and knowledge discovery. Query interfaces of web databases share common building blocks. After extracting...
A Hierarchical Approach to Model Web Query Interfaces for Web Source Integration
Much data in the Web is hidden behind Web query interfaces. In most cases the only means to “surface” the content of a Web database is by formulating complex queries on such interfaces. Applications such as Deep Web crawling and Web database integration require an automatic usage of these interfaces. Therefore, an important problem to be addressed is the automatic extraction of query interfaces...
VIQI: A New Approach for Visual Interpretation of Deep Web Query Interfaces
Deep Web databases contain more than 90% of the pertinent information on the Web. Despite their importance, users do not benefit from this treasure. Many deep web services offer competitive services in terms of price, quality of service, and facilities. As the number of services is growing rapidly, users have difficulty querying many web services at the same time. In this paper, we imagine a syste...